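To truly master Git, you have to look beneath the user-friendly interface of Porcelain and understand Plumbing, the low-level engine that manages Git's internal object database. That database is a content-addressable filesystem in which every piece of data is stored as an immutable object.
1. Porcelain and Plumbing
Porcelain refers to the high-level commands (such as git status) that are designed for human interaction. Plumbing refers to the low-level commands that operate directly on Git's object database and give us access to Git's actual internal structure.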
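2. The Object Database
Git operates as a key-value store in the .git/objects directory, which holds its internal objects: blobs, trees, commits, and tags. Branches are often mentioned alongside these objects, but they are really just references to commits.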
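As a rough illustration of the key-value idea, the sketch below models the object database with a Python dictionary. The names `objects`, `refs`, and `put` are invented for this example, and the commit body is a placeholder rather than Git's real commit format; only the hashed record layout (type, space, size, NUL byte, content) follows Git's actual scheme.

```python
import hashlib

# Hypothetical in-memory stand-ins for .git/objects and for branch references.
objects = {}   # key: 40-character hex id, value: raw object bytes
refs = {}      # key: branch name, value: id of the commit it points to


def put(kind, payload):
    """Store payload under the SHA-1 of its header plus content; return the key."""
    record = f"{kind} {len(payload)}".encode() + b"\x00" + payload
    key = hashlib.sha1(record).hexdigest()
    objects[key] = record  # identical content always lands under the same key
    return key


blob_id = put("blob", b"print('hello')\n")
# Placeholder body: a real commit also names a tree, an author, and so on.
commit_id = put("commit", b"illustrative commit body\n")

# A branch is not an object. It is only a name that maps to a commit id.
refs["main"] = commit_id

# Storing the same content again returns the same key and adds no new entry.
assert put("blob", b"print('hello')\n") == blob_id
print(len(objects), "objects;", "main ->", refs["main"])
```

Because the key is derived from the value, the store deduplicates identical content automatically, and a reference stays cheap: it holds one id and no data of its own.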
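3. SHA-1 Addressing
Every object is named by a unique 40-character hexadecimal SHA-1 checksum. Git optimizes storage by using the first two characters as a subdirectory name (for example, af/) and the remaining 38 characters as the filename.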
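The addressing rule can be sketched in a few lines of Python. This assumes a blob object, whose id Git computes over a small header (`blob`, a space, the byte length, a NUL byte) followed by the file content; the function names `blob_id` and `loose_object_path` are made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def blob_id(content: bytes) -> str:
    """SHA-1 over Git's blob header ('blob <size>' + NUL byte) plus the content."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> PurePosixPath:
    """First two hex characters name the subdirectory, the other 38 the file."""
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


oid = blob_id(b"hello\n")
path = loose_object_path(oid)
print(oid)    # 40 hexadecimal characters
print(path)   # .git/objects/<2 chars>/<38 chars>
```

In a real repository, the id printed here should match what `git hash-object` reports for a file with the same content.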
QUESTION 1
Which of the following best describes 'Porcelain' commands in Git?
Low-level commands for manual database manipulation.
High-level commands meant for day-to-day user interaction.
A specific type of encrypted SSH key.
The process of garbage collection.
✅ Correct!
Porcelain commands like 'git status' and 'git commit' are the user-friendly interface we use daily.
❌ Incorrect
Low-level commands are known as 'Plumbing'; Porcelain is the high-level layer.
QUESTION 2
Where does Git store its internal object database?
.git/config
.git/objects
.git/refs
.git/hooks
✅ Correct!
The '.git/objects' folder is the content-addressable store for all blobs, trees, and commits.
❌ Incorrect
The '.git/config' file stores settings; objects are stored in '.git/objects'.
QUESTION 3
Which internal Git object represents the content of a single file?
Tree
Blob
Commit
Tag
✅ Correct!
Blobs (Binary Large Objects) store the raw content of files without filenames.
❌ Incorrect
Trees represent directory structures; Blobs represent the file content itself.
QUESTION 4
How does Git create the 40-character identifiers for its objects?
Using a random number generator.
Using SHA-1 checksums of the object's content.
By sequential numbering of commits.
Using the timestamp of the operation.
✅ Correct!
Because each identifier is a SHA-1 checksum of the object's content, that content cannot be corrupted without Git detecting it.
❌ Incorrect
Git uses deterministic hashing (SHA-1), not random or sequential numbers.
QUESTION 5
What happens at the plumbing level when you run 'git status'?
It simply reads a single text file labeled 'status.txt'.
It compares the working directory against the 'Object Web' (blobs and trees).
It deletes all untracked files automatically.
It re-encrypts the entire repository.
✅ Correct!
Git walks tree and blob objects and compares them with the index and the working directory to find the differences between states.
❌ Incorrect
Behind the scenes, Git performs plumbing-level comparisons between the working directory, the index, and the object database.
Case Study: Investigating the Object Store
Understanding File System Architecture in Git
To understand how Git maintains data integrity and efficiency, we must examine the physical storage on disk. Since every piece of data (blobs, trees, tags, and commits) is immutable, Git must organize these files in a way that remains performant even with thousands of objects. This foundational structure relies on the SHA-1 hashing algorithm to 'name' files.
Q
In your my-git-repo repository, open the folder .git/objects and identify how Git stores objects using SHA-1 checksums. (Word count requirement: 25 words minimum).
Solution:
Git calculates a 40-character SHA-1 hash for each object. It uses the first two characters to create a subdirectory (e.g., af/) and uses the remaining 38 characters as the filename within that folder. This structure prevents thousands of files from cluttering a single directory, which keeps lookups on the file system efficient.
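To see this layout without touching a real repository, the sketch below writes one blob into a scratch directory and reads it back. It covers only loose objects, which Git stores zlib-compressed; packfiles are out of scope. The helpers `write_loose_object` and `read_loose_object` are hypothetical names for this example, not Git commands.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> str:
    """Write content as a blob under objects_dir and return its 40-character id."""
    record = f"blob {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(record).hexdigest()
    target = objects_dir / oid[:2] / oid[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(zlib.compress(record))  # loose objects are zlib-deflated
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> bytes:
    """Locate the file from the id, inflate it, and strip the header."""
    record = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    _header, _, content = record.partition(b"\x00")
    return content


with tempfile.TemporaryDirectory() as tmp:  # scratch directory, not a real repo
    objects_dir = Path(tmp) / "objects"
    oid = write_loose_object(objects_dir, b"hello\n")
    # Collect every stored file as "<2 chars>/<38 chars>".
    layout = sorted(p.relative_to(objects_dir).as_posix()
                    for p in objects_dir.rglob("*") if p.is_file())
    restored = read_loose_object(objects_dir, oid)

print(layout)
print(restored)
```

The id alone is enough to find the file again, which is what makes the directory behave like a key-value store.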
Q
Why is it impossible to change a file in Git's history without changing its SHA-1 hash?
Solution:
Because the hash is calculated directly from the content. This makes Git a content-addressable filesystem: changing even a single bit in a blob produces a completely different SHA-1 ID. The change also propagates upward, because a tree records the IDs of its blobs and a commit records the ID of its tree and of its parent commits, so every commit from that point on gets a new ID and the alteration cannot go unnoticed.
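A small experiment makes this concrete. The sketch below hashes two blobs that differ in a single character, using the same blob header convention Git uses; the file contents and the `blob_id` helper are invented for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 over Git's blob header ('blob <size>' + NUL byte) plus the content."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


original = b"timeout = 30\n"
tampered = b"timeout = 31\n"  # exactly one character differs

before = blob_id(original)
after = blob_id(tampered)

print(before)
print(after)
# Count how many of the 40 hex positions still agree between the two ids.
print(sum(a == b for a, b in zip(before, after)), "of 40 positions match")
```

The two ids share no recognizable structure, so an edited file cannot keep its old name in the object database.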